If your answer is "it's not", you can probably stop reading now.
I've been thinking about the things that make Haskell interesting. Of
course, the main one is that it's a 100% functional language (as opposed
to an OOP/FP hybrid like Objective Caml), so if you want to see what
functional programming is like, Haskell is the place to go. But on
reflection, Haskell has several interesting and/or useful attributes
which actually have little or nothing to do with functional programming.
Haskell has automatic type inference, for example. You don't *need* a
functional language for that; you could totally do it in a normal
language. It's just that, historically, nobody has. (I guess the sort of
people who think about computer programs mathematically also tend to be
the sort of people who realise that things like type signatures can be
derived algorithmically.)
Everybody thinks that a programming language has to either have explicit
types (Pascal, C, C++, Java, Eiffel...) or no static typing at all
(Smalltalk, Tcl, JavaScript, every scripting language known to
mankind...) Haskell demonstrates that there is a Third Way: static
typing, but with the types implicit rather than explicit.
Obviously, there are times when either you NEED to be explicit - because
the correct answer is ambiguous, because you want improved performance,
whatever. There are also times when it's just "good style" to be
explicit; top-level functions should usually have type signatures. But
you don't have to write a signature for every single damned expression
and variable you use in the entire program; the compiler will figure it out.
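For example, here's a toy definition of my own, with no signatures anywhere:
addPairs xs ys = zipWith (+) xs ys
Feed that to GHC and it quietly infers addPairs :: Num c => [c] -> [c] -> [c] for you, including the Num constraint you might not have thought to write yourself.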
Another thing Haskell has is parametric polymorphism. If you write some
code that moves data around, it shouldn't *matter* what the hell the
type of the data is. I mean, if you don't "do" anything with it other
than move it, what does it matter? So, for example, it should be
possible to write a function that joins two containers together,
regardless of what kind of element data they contain. (Obviously the
type of the containers themselves matters, but not the type of whatever
they contain.)
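In Haskell this is exactly what the type says. Here's a home-grown version of list append (the name is mine):
joinLists :: [a] -> [a] -> [a]
joinLists []     ys = ys
joinLists (x:xs) ys = x : joinLists xs ys
The type variable "a" says the two lists must contain the same type of element, but that type can be absolutely anything. The function never looks at the elements, so it never needs to know.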
This pre-supposes that you have type-safe containers in the first place.
In most OO languages, you just create a container of "Object", which can
then contain anything. If you particularly want a container to contain,
say, only Customer objects, you usually have to implement this yourself,
by hand.
More recently, a couple of OO languages have introduced this feature
they call "generics". Eiffel is the first language I'm aware of that did
it, although I gather Java has it too now. What it allows you to do is
write a class parameterised by another class; the typical case being a
container class parameterised by the class of element it contains. (In
Eiffel, integers and floats are classes, unlike the Java brokenness.)
The difference is that Eiffel makes this seem like some highly complex
uber-feature that only a few people will ever need to use, in special
circumstances. Haskell makes it trivial. Defining a parametric type
requires barely a few keystrokes more than defining the same thing with
no parameters.
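To see just how cheap it is, compare these two toy types of mine:
data IntPair = IntPair Int Int   -- fixed to Int forever
data Pair a  = Pair a a          -- works for any element type
A handful of keystrokes, and Pair works for every element type you'll ever think of.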
C++ has templates, and people often compare this to generics. I'm not
sure why. As far as I can tell, templates are not even remotely similar to
generics.
- Generics are an extension to the type system which allow the type
checker to handle more sophisticated type signatures.
- Templates are, as best as I can tell, preprocessor macros on steroids.
They automate the process of creating a new, separate class for each
combination of parameter values. It's like copy & pasting each time you
want a new container class, except the template preprocessor automates
it for you. This is clearly very much more maintainable, but not the
same as extending the type checker.
It seems that templates can also be used to do things which are
impossible with generics (e.g., making the implementation *different*
for certain special cases). So I'm not sure why people compare them at
all, because they're so totally different.
Another thing Haskell has is algebraic data types. Other languages could
have this. It probably doesn't play well with the OO method though. In
fact, let me expand on that.
The OOP notion of classes, inheritance, substitution and so forth
basically assumes a model where you can always extend the current class
hierarchy. For example, you initially decide what (conceptual) properties
every widget must have and write that in an abstract class. Thereafter,
you can always add new kinds of widget. (Or even specific kinds of
widget, of course.) It's a system that works fairly well.
Haskell with its algebraic data types assumes that the data you're
trying to model has a fixed, finite set of possibilities, which you can
describe once and for all. If that sounds stupid and limiting, consider
a binary tree. Every node in a binary tree is either a leaf node or a
branch node. You will never, ever, need to add a new kind of node. (If
you did, it wouldn't be a *binary tree*. It would be something else.)
So the data structure of a binary tree can be described once and for
all. However, the operations you can perform on it are extensible; you
can always add new operations.
Now consider the OOP case. You write an abstract "tree" class with
concrete "leaf" and "branch" subclasses. Each time you want to do
something new with the tree, you write an abstract method in the tree
class and add half of the implementation to each of the two concrete
subclasses.
In Haskell, you write one line of code which says "a tree is either a
leaf containing an item, or a branch containing two subtrees". You then
write as many functions as you like, which handle the leaf and branch
cases together, in one place (not split between three places).
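Concretely (the function names here are mine):
data Tree a = Leaf a | Branch (Tree a) (Tree a)

size :: Tree a -> Int
size (Leaf _)     = 1
size (Branch l r) = size l + size r

depth :: Tree a -> Int
depth (Leaf _)     = 1
depth (Branch l r) = 1 + max (depth l) (depth r)
Each new operation is just a new function, with both cases sitting next to each other inside it.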
For a binary tree, or an abstract syntax tree, or any number of other
simple, rigidly-defined data structures you could envisage, this
approach is actually much better. If you're writing a compiler or
interpreter, you don't *need* the abstract syntax tree to be
extensible. But you probably *do* need to be able to add new program
passes and code manipulation functions all the time. (And letting the
data structure be parameterised is probably Very Freaking Useful. Part
of the compiler might use strings for names, while another part might
use some more sophisticated structure representing fully-qualified
names, for example.)
Now let's flip this and look at widgets again. Using the ADT approach,
you'd need a giant "widget" type capable of representing any possible
widget. And functions that all know how to handle any possible widget.
And the code for handling a specific widget type is scattered between
all the widget-handling functions.
You're never ever going to need to add new functions for handling "any
possible widget" (although you might want one for handling a *specific*
kind of widget). But you very likely *will* want to add new types of
widget. But, uh, you can't.
In summary:
- If you have a limited number of structures which you want to process
in an unlimited number of ways, ADTs work best.
- If you have an unlimited number of structures which you want to
process in a limited number of ways, a class hierarchy works best.
Haskell, of course, has to go one better: Haskell has what it calls
"classes", but really they're "interfaces". And you can add them after
the fact. Which means you can grab a bunch of unrelated data types
(possibly that you didn't even write) and create a single interface for
processing them all.
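A sketch of what that looks like (the class and instances here are invented for illustration):
class Describable a where
  describe :: a -> String

instance Describable Bool where
  describe b = "the boolean " ++ show b

instance Describable [a] where
  describe xs = "a list of " ++ show (length xs) ++ " things"
Neither Bool nor lists were written with this interface in mind, but both can be retrofitted with it after the fact.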
The one small glitch is that Haskell demands that all types are known at
compile-time. Which means that doing OO-style stuff where types are
changed dynamically at run-time is a little tricky. Haskell 98 (and the
newly-released Haskell 2010) can't do it, but there are
widely-implemented extensions that can. It gets kinda hairy though.
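For the record, here's roughly what one of those extensions (existential quantification, supported by GHC) looks like; the Draw class and the names are invented:
{-# LANGUAGE ExistentialQuantification #-}

class Draw a where
  draw :: a -> String

-- A Widget wraps *any* type with a Draw instance; the concrete
-- type is hidden, and only the class methods can touch it.
data Widget = forall a. Draw a => Widget a

drawAll :: [Widget] -> [String]
drawAll ws = [ draw w | Widget w <- ws ]
Now a [Widget] list can mix buttons, sliders and whatever else, as long as each one knows how to draw itself.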
Hand in hand with ADTs come case expressions. Now, other languages
have case statements, but they tend to be quite feeble. In most
languages they only work on integers; I *think* Pascal lets you use 'em
on enumerations [but I'm not really sure]. And in C-like languages, you
can execute multiple alternatives at once. While this is no doubt
occasionally useful, most of the time it's just a source of
infuriatingly hard to find bugs (i.e., you left off the "break"
statement, so a whole bunch of code gets executed when you didn't want
it to).
In Haskell, case expressions work on *everything*. (Everything whose
internal structural details are in scope, anyway. These internal details
can be hidden from client code.) What is more, they use "pattern
matching". This powerful feature allows you to do things like determine
whether a tree has a particular shape and pluck several deeply-buried
substructures out of it, all in a single transparent 1-liner. For example,
case list of
  [[x], [y,_], [z,_,_]] -> foo x y z
This checks whether "list" contains a list with exactly 3 elements, each
of which is a sublist of length 1, 2, 3 (respectively) - and, if all
this is the case, it fishes out the first element of each sublist and
feeds all the data to the "foo" function. All in one easily-readable line
of code. The alternative in Java would be something like
if (list.size() == 3) {
    if (list.get(0).size() == 1) {
        int x = list.get(0).get(0);
        if (list.get(1).size() == 2) {
            int y = list.get(1).get(0);
            if (list.get(2).size() == 3) {
                int z = list.get(2).get(0);
                foo(x, y, z);
            }
        }
    }
}
which is obviously a hell of a lot longer. What is more, if your Haskell
case expression has multiple cases in it (which it usually does),
because of the declarative form of the patterns, the compiler can
rearrange the code so that each test need only be performed once, and
the compiled code can proceed with the minimum number of branch
instructions. Such things are far, far harder to do with the
deeply-nested forest of if-statements above. (Not to mention that the
expressions in the if-statements might actually alter the program's
state, so changing their order may alter the program!)
Again, this probably doesn't gel too well with OOP, which states that
each data structure ("object") should be responsible for its own
internal substructure. In other words, you shouldn't *need* to fish out
deeply-buried data; the intervening objects in the chain should be
doing this for you. According to "the OO way", the only thing you should
ever do to objects is call methods on them, not inspect them for type
information or substructure.
But something like, say, JavaScript (which isn't really OO in the first
place) could certainly make use of ADTs and/or pattern matching, I think.
Another big deal about Haskell is how functions are first-class. The
fact that functions are pure makes this a really big win, but even
without pure functions, it's still useful. (Smalltalk implements it, for
example.) It doesn't immediately sound very interesting, but what it
enables you to do is write a loop once, in a library somewhere, and pass
in the "loop body" as a function.
A trivial example is a "fold". A fold takes some container (e.g., a
list) and combines its elements with a binary function. The canonical
example is
sum = foldr (+) 0
This says that to calculate the sum of a list, you fold it using the "+"
function, and with a starting value of 0. (This also means that the sum
of a completely empty list is 0.) It is obvious that
product = foldr (*) 1
but there are plenty of other uses too. For example, the (++) function
joins two lists together, and
concat = foldr (++) []
turns a list-of-lists into a single giant list. Other examples include
minimum = foldr1 min
maximum = foldr1 max
which find the minimum or maximum of a list using the binary "min" and
"max" functions. (The "fold1" function demands that the list be
non-empty, and doesn't require a start value.) I could go on all day,
but it turns out almost all uniform list operations can be implemented
as a fold of some kind.
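For the curious, the whole machine fits in two lines; this is essentially how the Prelude defines foldr (renamed here to avoid a clash):
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr f z []     = z
myFoldr f z (x:xs) = f x (myFoldr f z xs)
In other words, it replaces every (:) in the list with f, and the final [] with z.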
Other languages have done this. Smalltalk is the obvious example. It has
#do: and #collect: which are approximately equivalent to Haskell's
"map", #select: and #reject: which are Haskell's "filter", and many
more. Of course, being an object-oriented language, Smalltalk makes all
these methods work nicely for all container types, not just lists.
(Haskell hasn't copied this trick yet. At least, not in the *standard*
libraries. It's not difficult to swap one container for another, but it's tricky to
write code that works generically for any container.)
The other thing about first-class functions is that in both Haskell and
Smalltalk, it is utterly trivial to spawn a new thread:
forkIO foo      -- Haskell
[foo] fork      "Smalltalk"
What Smalltalk doesn't do, and Haskell does, is make combining small
functions into larger functions trivial as well. Haskell has anonymous
"lambda functions", it has the function composition operator, it has
curried functions, it has the dollar operator, and so on and so forth.
All of this *could* be put into Smalltalk, or any other OO language for
that matter. (Whether it would be efficient is a whole other matter...)
The final thing that's interesting about Haskell is the rate at which it
changes. Haskell (or rather, GHC) implemented Software Transactional
Memory 5 years ago. I'm not aware of any mainstream languages that have
this yet. GHC added Data Parallel Haskell 2 years ago, and the busy PhD
researchers are still hard at work getting it into a usable state.
(There's even an experimental GPU backend already.) Around the same
time, Type Families were added. (How long did it take Java to do
Generics?) The pace of change is staggering. Lots of new and very cool
stuff happens in Haskell first. (No doubt it will eventually all be
stolen by lesser languages so that people still don't use Haskell itself.)
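To give a flavour of the STM interface mentioned above, here's a minimal sketch using the real Control.Concurrent.STM module; the bank-account example is invented:
import Control.Concurrent.STM

-- Move money between two accounts atomically: no other thread can
-- ever observe a state where the money has left one account but
-- not yet arrived in the other.
transfer :: TVar Int -> TVar Int -> Int -> IO ()
transfer from to amount = atomically $ do
  f <- readTVar from
  t <- readTVar to
  writeTVar from (f - amount)
  writeTVar to   (t + amount)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  transfer a b 25
  atomically (readTVar a) >>= print   -- 75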